ABSTRACT


The first part of this report presents a procedure
for processing real-world image sequences produced
by relative translational motion between a sensor
and environmental objects. In this procedure, the
determination of the direction of sensor
translation is effectively combined with the
determination of the displacements of image
features and environmental depth. It requires no
restrictions on the direction of motion, nor on the
location and shape of environmental objects. It
has been applied successfully to real-world image
sequences from several different task domains.

In the second part we extend this procedure to less
restricted cases of rigid body motion. Part of the
robustness of the technique is that it can work
with reasonable precision even when applied to
small image areas containing a few features. This
allows more general image motion to be locally
approximated as translations of small areas in the
environment. Given such an approximation, we then
show how to recover the parameters of camera
motion.


I.  INTRODUCTION 


I. A.  Definitions 


An Image Displacement Vector is a two-dimensional
vector describing the displacement of an image
feature from one image to the next. An Image
Displacement Field is the set of image displacement
vectors for successive images. An Image
Displacement Sequence indicates the positions of an
image feature over several successive images.
Though we are dealing with discrete image
sequences, it is often possible to describe the
continuous curve along which an image feature point
is moving. This curve is called the Image
Displacement Path.


Corresponding to image motions we have a set of
terms for describing environmental motions. An
Environmental Displacement Field is the set of
three-dimensional vectors indicating the positions
of environmental points at successive instants. An
Environmental Displacement Sequence indicates the
position of an environmental point over several
successive instants. An Environmental Displacement
Path describes the three-dimensional curve that
environmental points are moving along for
particular motions.

The Environmental Direction of Motion Field (EDMF)
associates with each image point a unit vector
describing the three-dimensional direction of
motion of its corresponding environmental point.
Note that for a particular motion, the vectors of
the EDMF approximate the tangents of the
corresponding environmental points along their
Environmental Displacement Paths.

I.B.  Coordinate  System 


Our analysis is restricted to image sequences
formed by a sensor moving relative to a stationary
environment. The t-th image of an image sequence
is referred to as I(t). Motion of the sensor from
one image to the next is characterized by a camera
motion parameter vector N(t), whose six dimensions
describe the displacement and reorientation of the
sensor from time t to t+1.

The camera model consists of a planar retina
embedded in a three-dimensional Cartesian
coordinate system (x,y,z), with the origin at the
focal point and the optical axis aligned with the
z-axis (figure 1). The x and y axes correspond to
the gravitationally intuitive horizontal and
vertical directions. The image plane is parallel
to the xy plane and at some distance along the z
axis. Positions in the image plane are described
using a 2-d coordinate system aligned with the x
and y axes of the camera coordinate system and with
the origin determined by the intersection of the
image plane and the optical axis.


Fig 1.  Camera Coordinate System


I.C.  Recovery of Camera Motion Parameters


There are 5 parameters [PRA81] that can be
recovered from processing image motion without
knowing absolute camera displacement or velocity
(since absolute depth is lost): two parameters for
the unit vector (T1(t), T2(t)) which describes the
axis of translational motion at time t; two
parameters for the unit vector (R1(t), R2(t))
describing the axis of rotation at time t; and one
parameter R3(t) which describes the extent of
rotation about the axis of rotation at time t.
Both of these axes are positioned at the origin of
the camera coordinate system. The problem of
processing image motion resulting from rigid body
camera motion can be organized into subcases of
increasing complexity, corresponding to the number
of camera motion parameters that are unconstrained.

II.  PROCESSING  TRANSLATIONAL  MOTION 


In this section, we begin with a review of the
properties of translational displacement fields and
an overview of the procedure for processing them.
This is followed by a more detailed description of
the components of the procedure: feature
extraction, error measure computation, and
optimization. We then present some experimental
results showing the effectiveness of the method and
discuss some extensions.


II. A.  Translational  Motion  Properties 


For purely translational motion, the image
displacement paths are determined by the
intersection of the translational axis with the
image plane. If the translational axis intersects
the image plane on the positive half of the axis,
the point of intersection is called a Focus of
Expansion (FOE) and the image motion is along
straight lines radiating from it. This corresponds
to camera motion towards environmental points. If
the translational axis intersects the image plane
on the negative half of the axis, the point is
called a Focus of Contraction (FOC) and the image
displacement paths are along straight lines
converging towards it. This corresponds to camera
motion away from environmental points. The
intersections of axes parallel to the image plane
are points at infinity and are treated as FOEs.
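
The FOE/FOC classification can be illustrated with a short sketch. It is not taken from the procedure's implementation: the function name focus_of_translation, the focal_length parameter, and the tolerance eps are assumptions introduced here, and the image plane is assumed to lie at z = focal_length in the camera coordinate system of section I.B.

```python
import math


def focus_of_translation(axis, focal_length=1.0, eps=1e-9):
    """Classify a translational axis and locate its image-plane intersection.

    axis is a 3-vector (tx, ty, tz) in camera coordinates (hypothetical
    convention: image plane at z = focal_length).  Returns (kind, point),
    where kind is "FOE" or "FOC" and point is (x, y) on the image plane,
    or None when the intersection is a point at infinity.
    """
    tx, ty, tz = axis
    norm = math.sqrt(tx * tx + ty * ty + tz * tz)
    if norm == 0.0:
        raise ValueError("translational axis must be non-zero")
    tx, ty, tz = tx / norm, ty / norm, tz / norm
    if abs(tz) < eps:
        # Axis parallel to the image plane: point at infinity, treated as FOE.
        return "FOE", None
    point = (focal_length * tx / tz, focal_length * ty / tz)
    # Positive half of the axis meets the plane: expansion; otherwise contraction.
    return ("FOE" if tz > 0.0 else "FOC"), point
```

For example, focus_of_translation((0.0, 0.0, 1.0)) places an FOE at the image origin, which corresponds to motion straight along the optical axis.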

The translational axis alone does not completely
determine an image displacement field. It
constrains the direction of motion of image
features, but not the magnitude of their
displacements, which are a simple function of both
feature position in the image and the depth of the
corresponding environmental points.

The set of all possible translational axes
describes a unit sphere called the Translational
Direction Sphere. The procedures below are defined
with respect to this sphere, rather than the image
plane itself, for reasons described in section
II. D. 5.


II. B.  Overview 


Processing translational motion consists of
determining the axis of translation and finding the
extent of image feature displacements along the
paths determined by the corresponding FOE or FOC.
The direction of camera translation from an image
sequence is computed in two basic steps: Feature
Extraction and Search. The feature extraction
process picks out small image areas which
potentially correspond to distinguishing parts of
environmental objects. The search process
optimizes an error measure which reflects the
validity of a hypothesized translational axis by
evaluating the matches of extracted features along
the image displacement paths determined by the
hypothesized translational axis. The search
process consists of two basic steps: a global
sampling of the error measure to determine the
rough position of the minimum followed by a search
based on local evaluation of the error measure
gradient.

The procedure requires specification of 1) the
feature extraction process; 2) the form and
computation of the error measure; and 3) the
organization of the search process.


II. C.  Feature  Extraction 


The feature extraction process is used to determine
small  areas  (sometimes  called  image  points)  in  an 
image  that  are  distinct  from  neighboring  areas. 
This  distinctiveness  limits  the  likelihood  of 
matches  of  these  image  areas,  and  possibly  reflects 
a  correspondence  to  actual  and  significant  points 
in  the  environment,  such  as  points  of  high 
curvature  on  object  boundaries,  texture  elements, 
surface markings, etc. (However, some features,
termed false features, will result from noise,
occlusion, and light source effects and have
behavior which is difficult to analyze.) Features
can be represented as arrays of numbers extracted
directly  from  an  image  or  as  parameterized  tokens 
describing  local  image  properties.  In  this  paper, 
we  refer  to  features  exclusively  as  small  arrays  of 
data  values  centered  at  some  point  in  an  image  at 
some  time  t. 

Following Moravec [MOR77, MOR80], the method of
feature  extraction  used  here  is  based  upon  finding 
image  areas  which  are  significantly  different  than 
their  neighboring  areas.  Using  a  correlation 
measure  normalized  between  1  (for  perfect 
correlation)  and  0,  the  distinctiveness  of  a 
feature  is  1  minus  the  best  correlation  value 
obtained  when  the  feature  is  correlated  with 
respect  to  its  immediately  neighboring  areas. 
Selecting  good  features  then  requires  finding  the 
local  maxima  in  the  values  of  the  distinctiveness 
measure  over  an  image. 
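
A minimal sketch of the distinctiveness measure follows. It assumes an image stored as a list of rows of non-negative values and uses a pseudo-normalized correlation as the correlation measure; the helper names (patch, correlation, distinctiveness) and the half-width parameter are illustrative choices, not details given in the text. Feature selection would then keep the local maxima of this value, which is not shown.

```python
def patch(image, row, col, half):
    """Flat list of the (2*half+1)-square patch centered at (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def correlation(a, b):
    """Pseudo-normalized correlation: 1 for identical patches.

    For non-negative data the value lies between 0 and 1.
    """
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) + sum(y * y for y in b)) / 2.0
    return 1.0 if den == 0 else num / den


def distinctiveness(image, row, col, half=1):
    """1 minus the best correlation with the eight immediate neighbors.

    The caller must keep (row, col) at least half + 1 pixels from the border.
    """
    center = patch(image, row, col, half)
    best = max(
        correlation(center, patch(image, row + dr, col + dc, half))
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    return 1.0 - best
```

On a uniform area every neighbor correlates perfectly and the distinctiveness is 0, while an isolated bright point yields the maximum value of 1.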

We have extended this approach somewhat by
constraining the neighborhoods over which the
features are selected to contours determined by
other global processes which are sensitive to image
edges. For the results in section II. F., these
contours were determined using zero-crossings.


II. C. 1.  Feature  Extraction  Using  Zero-Crossings 


The  use  of  zero-crossings  to  determine  significant 
image  contours  at  different  levels  of  resolution 
has been proposed and extensively studied by Marr
et al. [HIL80, MAR80]. In this processing an
image is convolved with Gaussian-Laplacian masks
(del**2g) of different positive widths and
thresholded  at  zero  to  determine  zero-crossing 
contours.  These  contours  are  significant  since 
they  correspond  to  the  points  of  greatest  change  in 
the  convolved  image.  The  distinctiveness  measure 
can  be  applied  to  points  along  these  contours  in 
the  convolved  image  with  the  local  maxima 
determining  the  position  of  potential  features. 
This  generally  has  the  effect  of  finding  points  of 
high  curvature  along  the  zero-crossing  contour, 
although  points  corresponding  to  local  occlusion 
vertices  and  weak  maxima  will  also  be  extracted. 


Fig  2.  Curvature  Approximation 


Many of the weak features can be removed by
suppressing those which are at points of low
curvature along the zero-crossing contours. The
curvature of a feature on a contour is approximated
by the inner product of the normalized vectors
describing the relative positions of the features
adjacent to it along the contour (the cosine of
angle alpha in figure 2). These values are then
thresholded between 1 (corresponding to high
curvature) and -1 (corresponding to low curvature).
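
The curvature approximation can be sketched as follows. The function names and the threshold value used in the example are hypothetical; contour features are assumed to be given as an ordered list of (x, y) positions, and only interior features (those with a neighbor on each side) are tested.

```python
import math


def curvature_cosine(prev_pt, pt, next_pt):
    """Inner product of the unit vectors from pt to its two contour neighbors.

    Close to -1 on a straight contour (low curvature) and approaching 1
    where the contour folds back sharply (high curvature).
    """
    ux, uy = prev_pt[0] - pt[0], prev_pt[1] - pt[1]
    vx, vy = next_pt[0] - pt[0], next_pt[1] - pt[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("adjacent features must be distinct points")
    return (ux * vx + uy * vy) / (nu * nv)


def suppress_low_curvature(points, threshold):
    """Keep interior contour features whose cosine exceeds the threshold."""
    kept = []
    for i in range(1, len(points) - 1):
        if curvature_cosine(points[i - 1], points[i], points[i + 1]) > threshold:
            kept.append(points[i])
    return kept
```

With a threshold of -0.5, for instance, features on straight runs of an L-shaped contour are dropped and only the corner feature is retained.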

Use of zero-crossing-based features requires
specification of the sizes of the convolution masks
that are employed and deciding whether to position
extracted feature points with respect to the
unprocessed image or the convolved images. In
general, it is beneficial to use masks of various
positive widths for sensitivity to features at
different levels of resolution. The processing
described below can be applied independently to the
pairs of successive images formed by convolving the
successive images with del**2g masks of different
positive widths. Alternatively, features can be
extracted from the original, unfiltered image at
the positions where features were determined in the
convolved images, though experience with large
masks has shown that features can move significant
distances from where a person would generally place
them with respect to the original image.


II.  D.  Error  Measure 


The  error  measure  is  used  to  evaluate  the  validity 
of  a  translational  axis  with  respect  to  successive 
images. It reflects the quality of the matches of
extracted  features  along  the  image  displacement 
paths  determined  by  a  potential  translational  axis. 
It  is  expected  that  most  features  will  have  their 
best  matches  along  the  image  displacement  paths 
determined  by  the  correct  translational  axis.  This 
will  tend  to  be  violated  by  false  features  and 
those  features  affected  by  occlusion. 

For example, a sketch of several of the image
displacement paths determined by the intersection
of a potential translational axis and the image
plane is shown for a set of extracted features in
figure 3a. If the hypothesized translational axis
is correct, the majority of features will tend to
have good matches along these paths. Figure 3b
shows the match profile for a particular feature
along its displacement path with respect to the
succeeding image. The units of displacement are
pixels.


Fig 3.  Constrained Feature Displacements


The development of an error measure requires a
measure for the degree of match between features
and an interpolation process for determining
positions along an image displacement path. Each
of these can be implemented in various ways with
the choices generally involving a trade-off between
the speed of evaluating the error measure and the
precision with which the translational axis can be
determined.


II. D. 1.  Match Metric


There are several metrics for similarity of nxn
pixel features of the form A(i,j) and B(i,j), where
i ranges from 1 to n and j ranges from 1 to n. We
have utilized:

$$\frac{\sum_i \sum_j A(i,j)\,B(i,j)}{\left(\sum_i \sum_j A(i,j)\,A(i,j) + \sum_i \sum_j B(i,j)\,B(i,j)\right)/2}$$

Normalized Absolute Value Difference (3)

$$1 - \frac{\sum_i \sum_j \left|A(i,j) - B(i,j)\right|}{\sum_i \sum_j A(i,j) + \sum_i \sum_j B(i,j)}$$

All of these measures have a value of 1 for a
perfect match. Of these, the first choice is the
most conventional, the second a good approximation
to the first, and the third is the quickest to
evaluate.


II. D. 2.  Interpolation Process


The interpolation process approximates the
potential displacements of a feature from an
initial image into a succeeding image. Depending
on the accuracy required, positions along the image
displacement path can be approximated a) roughly by
setting the coordinates of the feature's position
to the nearest integer value; or b) more
accurately by performing a subpixel interpolation
of the feature at each of a set of selected
positions along the image displacement path with
respect to the succeeding image. The basic
trade-off is between speed and accuracy, with
subpixel interpolation being a more expensive
computation.
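
The match metrics and the two interpolation options can be sketched together. This is an illustration under stated assumptions rather than the implementation used for the experiments: features are flat lists of non-negative values, images are lists of rows, the absolute value metric is taken in a form that equals 1 for a perfect match, bilinear interpolation stands in for the subpixel option, and all function names are hypothetical.

```python
import math


def moravec_norm(a, b):
    """Sum of products over the mean of the two sums of squares."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) + sum(y * y for y in b)) / 2.0
    return 1.0 if den == 0 else num / den


def abs_diff_norm(a, b):
    """One minus the summed absolute difference over the summed values."""
    den = sum(a) + sum(b)
    if den == 0:
        return 1.0
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / den


def bilinear(image, y, x):
    """Sample the image at a real-valued (y, x) position."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(image) - 1)
    x1 = min(x0 + 1, len(image[0]) - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def sample_patch(image, cy, cx, half, subpixel=True):
    """Patch values around (cy, cx): option (b) if subpixel, else option (a).

    The caller must keep the patch inside the image.
    """
    if not subpixel:
        # Option (a): snap the center to the nearest integer position.
        cy, cx = math.floor(cy + 0.5), math.floor(cx + 0.5)
    return [bilinear(image, cy + dy, cx + dx)
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]
```

The nearest-integer option reuses the same sampler with a snapped center, so the only cost difference between the two options is the interpolation arithmetic itself.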


II.  D.  3.  Error  Measure 


The error measure associates with a point on the
direction of translation sphere a value describing
the quality of image feature matches along the
image displacement paths determined by the
corresponding translational axis. This value is
computed by determining the best match for each
feature along the image displacement path
determined by the hypothesized translational axis
and then summing the normalized error values (using
one of the metrics above) for all of the image
feature points. Thus for a set of N features in an
initial image, a hypothesized translational axis,
and use of one of the match metrics above, the
error measure is

$$\sum_{i=1}^{N} \left(1.0 - \mathrm{bestmatch}(i)\right)$$

where bestmatch(i) is the best match value
associated with feature i along the image
displacement path determined for it by a
translational axis.


II. D. 4.  Properties of the Error Measure


The error measure should have a distinct global
minimum at the point on the unit sphere
corresponding to the correct translational axis.
It is expected to be well behaved globally because
it is very unlikely that translational axes that
are far from the correct position will define image
displacement paths that simultaneously allow good
matches for many features. Thus, we do not expect
competing candidates for the global minimum to be
widely separated, and the experiments we have
performed confirm this expectation.

The error measure will be affected by both
non-distinctive and false features.
Non-distinctive features will match well for many
different translational axes. Large numbers of
these weak features will flatten the response of
the error measure. False features will also
distort the error measure since they will often
have their best matches with incorrect
translational axes.

The effects of these poor features should be
compensated by the agreement of good features.
Every one of the good features will tend to have a
bad match for the incorrect FOE and their unanimity
is expected to override the lack of discrimination
of weak features and the random quality of the
matches of false features.


II. D. 5.  Direction of Translation Sphere


There  are  significant  advantages  in  defining  the 
error measure with respect to a unit sphere
instead of the potential positions of FOEs and FOCs
in  the  image  plane.  The  sphere  is  a  bounded 
surface  which  makes  uniform  global  sampling  of  the 
error  measure  feasible.  Additionally,  the 
resolution  in  the  position  of  the  translational 
axis varies across the surface of the image plane.
For example, the FOEs determined by translational
axes separated by very small angles will be
separated  by  larger  and  larger  distances  in  the 
plane  as  the  intersections  of  the  translational 
axes  and  the  image  plane  are  placed  further  from 
the  visible  image.  The  effect  of  this  on  the  error 
measure,  when  it  is  defined  over  the  image  plane, 
is  large  flat  areas  for  FOEs  further  from  the 
visible  portions  of  the  image.  Finally,  special 
criteria  must  be  used  to  distinguish  between  FOEs 
and  FOCs  if  the  error  measure  is  defined  relative 
to  the  image  plane.  Roughly  parallel  image 
displacement vectors could correspond to an FOE off
to  one  side  of  the  image  plane  or  to  an  FOC  off  to 
the  opposite  side.  On  the  direction  of  translation 
sphere,  the  corresponding  translational  axes  would 
be  close  while  on  the  plane  they  are  completely 
separated. 
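
The error measure of section II. D. 3., evaluated for a point on the direction of translation sphere, can be sketched as below. Several details are assumptions made for the illustration: a pinhole camera with focal length expressed in pixels, image rows increasing with y, feature positions given as (column, row), nearest-integer interpolation, and the absolute value metric in a form that equals 1 for a perfect match. The direction of image motion at position (x, y) for axis (Tx, Ty, Tz) is taken as (x Tz - f Tx, y Tz - f Ty), which follows from the pinhole model and handles FOEs, FOCs, and points at infinity uniformly without computing the image-plane intersection.

```python
import math


def abs_diff_match(a, b):
    """Absolute value metric, 1 for a perfect match (non-negative data)."""
    den = sum(a) + sum(b)
    if den == 0:
        return 1.0
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / den


def patch_at(image, col, row, half):
    """Flat patch centered at (col, row), or None if it leaves the image."""
    if (row - half < 0 or col - half < 0
            or row + half >= len(image) or col + half >= len(image[0])):
        return None
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def path_direction(col, row, axis, focal, center):
    """Unit direction of image motion for a hypothesized translational axis."""
    x, y = col - center[0], row - center[1]
    dx = x * axis[2] - focal * axis[0]
    dy = y * axis[2] - focal * axis[1]
    n = math.hypot(dx, dy)
    return (0.0, 0.0) if n == 0.0 else (dx / n, dy / n)


def best_match(image0, image1, feature, axis, focal, center, max_disp, half):
    """Best metric value for one feature along its displacement path."""
    col, row = feature
    template = patch_at(image0, col, row, half)
    ux, uy = path_direction(col, row, axis, focal, center)
    best = 0.0
    for d in range(max_disp + 1):
        c = int(math.floor(col + d * ux + 0.5))
        r = int(math.floor(row + d * uy + 0.5))
        candidate = patch_at(image1, c, r, half)
        if candidate is not None:
            best = max(best, abs_diff_match(template, candidate))
    return best


def error_measure(image0, image1, features, axis, focal, center,
                  max_disp=10, half=1):
    """Sum over features of (1.0 - bestmatch) for a hypothesized axis."""
    return sum(
        1.0 - best_match(image0, image1, f, axis, focal, center, max_disp, half)
        for f in features
    )
```

Because the measure takes a unit vector rather than an image-plane position, nearby axes on opposite sides of the FOE/FOC boundary produce nearby path directions, which is the property discussed above.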


II. E.  Search Organization


The  search  process  used  here  consists  of  two 
phases: a global sampling of the error measure to
determine  its  rough  shape  followed  by  a  local 
search  to  determine  the  minimum.  The  local  search 
is  initialized  at  the  position  where  the  minimum 
value  was  determined  by  the  global  sampling.  The 
procedure  used  for  the  local  search  is  steepest 
descent with a diminishing step-size. That is, the
steepest descent procedure begins with an initial
fixed step size and determines a local minimum
using it. The step-size is then reduced and the
procedure repeated until the step-size is at the
desired  resolution  for  the  determination  of  the 
translational  axis.  In  the  experiments  below  the 
initial  step-size  was  set  to  0.1  and  then  reduced 
to  0.025  and  0.005  radians. 

The form of the error function for several
different translational sequences is smooth, with a
single minimum in a large neighborhood around the
correct translational axis. Thus, the global
sampling could be quite sparse or the initial step
size of the local search quite large.
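
The two-phase search can be sketched as follows. The parameterization of the sphere by two angles, the finite-difference gradient, the latitude-band sampling pattern, and the cap on the number of moves are all assumptions of this sketch; only the step sizes 0.1, 0.025, and 0.005 radians come from the text. The gradient is taken in angle coordinates, which only approximates the gradient on the sphere and degrades near the poles.

```python
import math


def axis_from_angles(theta, phi):
    """Unit vector for polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def global_sample(error, spacing):
    """Evaluate error on roughly evenly spaced axes; return the best angles."""
    n_theta = max(1, int(round(math.pi / spacing)))
    best = None
    for i in range(n_theta + 1):
        theta = i * math.pi / n_theta
        # Fewer azimuth samples near the poles keeps the spacing comparable.
        n_phi = max(1, int(round(2.0 * math.pi * math.sin(theta) / spacing)))
        for j in range(n_phi):
            phi = j * 2.0 * math.pi / n_phi
            value = error(axis_from_angles(theta, phi))
            if best is None or value < best[0]:
                best = (value, theta, phi)
    return best[1], best[2]


def local_search(error, theta, phi, steps=(0.1, 0.025, 0.005), max_moves=1000):
    """Fixed-step steepest descent, repeated with diminishing step sizes."""
    def f(t, p):
        return error(axis_from_angles(t, p))

    for h in steps:
        for _ in range(max_moves):
            g_theta = (f(theta + h, phi) - f(theta - h, phi)) / (2.0 * h)
            g_phi = (f(theta, phi + h) - f(theta, phi - h)) / (2.0 * h)
            norm = math.hypot(g_theta, g_phi)
            if norm == 0.0:
                break
            new_theta = theta - h * g_theta / norm
            new_phi = phi - h * g_phi / norm
            if f(new_theta, new_phi) >= f(theta, phi):
                break  # local minimum at this step size; reduce the step
            theta, phi = new_theta, new_phi
    return axis_from_angles(theta, phi)
```

Here error is any function mapping a unit vector to a value to be minimized; in the procedure it would be the error measure of section II. D. 3. evaluated on a pair of images.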

The global search used the absolute value norm and
nearest integer interpolation. The sampling
increment corresponded to the vectors on the
direction of motion sphere being separated by
.314157 radians from each other. Maximal image
displacement along the hypothesized image
displacement paths was set to 10 pixels. Features
were centered at the positions shown in figure 5c.
The global sampling determined a minimum in the
error function at the unit vector (-.80902,
-.47554, .34548) on the direction of translation
sphere.


The local search used the Moravec norm and
bilinear interpolation. The determined
translational axis was (-.83738, -.42043, .34933).
The displacements of the feature points from figure
5b for this translational axis are shown in figure
6.


Fig 6.  Image Displacements


The procedure was repeated, but using features at
the positions from figure 5b (those prior to low
curvature suppression). This has the effect of
introducing weak and false features into the
computation. The translational axis extracted was
(-.82909, -.42281, .36585). This is a difference of
0.01863 radians or 1.06765 degrees from that
determined using the features indicated in figure
5c.

The procedure was also applied using the features
from the restricted subarea shown in figure 7,
corresponding to some faint tree texture. Using
these features, the translational axis extracted
was (-.84281, -.42928, .32465). This is a
difference of 0.02677 radians or 1.53418 degrees
from the translational axis determined using the
features centered at the positions indicated in
figure 5c.


Fig 7.  Image Subarea


Given the direction of translation and image
displacements, relative environmental depth can be
recovered by the simple relation [LEE80]

$$\frac{D}{\Delta D} = \frac{Z}{\Delta Z} \qquad (5)$$

where Z is the value of the Z component of an
environmental point at time t+1, delta Z is the
extent of environmental displacement along the Z
axis from time t to time t+1, D is the distance of
the corresponding image point from the FOE or FOC
at time t, and delta D is the image point's
displacement from time t to time t+1. Z can be
recovered in units of delta Z without knowledge of
the actual extent of camera displacement. When
delta D is small, the inferred depth values can be
quite erratic due to sensitivity to small numbers
in the denominator in the left hand side of
equation 5. For this reason, it is useful to keep
track of the image displacements over several
successive images with concurrent updating of the
inferred depth values. This was done using a
sequence of four successive images of the roadsign.
In this processing, the position of the
translational axis determined from images I(t) and
I(t+1) was used as the initial value in the local
search for determining the translational axis for
images I(t+1) and I(t+2).
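
The depth relation can be illustrated with a short sketch. The function names and the min_delta guard are assumptions introduced here, and the multi-image variant assumes a fixed FOE (constant direction of translation) so that the total image displacement from the first to the last image can be used as a longer baseline; the text does not specify how the concurrent updating was actually performed.

```python
def relative_depth(d, delta_d, min_delta=1e-6):
    """Depth at time t+1 in units of delta Z, from D / delta D = Z / delta Z.

    d is the distance of the image point from the FOE or FOC at time t and
    delta_d its displacement to time t+1.  Returns None when the displacement
    is too small for a stable estimate.
    """
    if abs(delta_d) < min_delta:
        return None
    return d / delta_d


def depth_from_sequence(distances, min_delta=1e-6):
    """Depth at the last image in units of the total camera displacement.

    distances holds the distance of one image point from the FOE in each
    successive image.  Using the first and last entries enlarges the
    denominator, which reduces the erratic behavior seen for small
    single-step displacements.
    """
    return relative_depth(distances[0], distances[-1] - distances[0], min_delta)
```

As a check under a pinhole model, a point at lateral offset 2 and depths 10, 9, 8, 7 (unit focal length, unit camera steps) gives image distances 2/10, 2/9, 2/8, 2/7; the single-step estimate recovers a depth of 9 steps at the second image, and the four-image estimate recovers 7/3 of the total displacement at the fourth.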

Given the image displacements determined from I(1)
to I(4) of the sequence, the depth values for image
points  along  the  contour  in  figure  5a  were  computed 
using  equation  5.  This  sequence  is  especially  nice 
for  presenting  depth  processing  results  since  the 
three  environmental  objects  in  the  images  are  at 
three  distinct  depths.  This  is  shown  in  figure  8a 
by  the  three  distinct  clusters  in  the  histogram  of 
the  depth  values  calculated  for  the  points  along 
the  contour  in  figure  5a.  Mapping  feature  labels 
from  these  clusters  back  onto  contour  points  from 
figure  5a  yields:  the  boundary  shown  in  figure  8b 
(the  sign),  the  boundary  shown  in  figure  8c  (the 
pole),  the  boundary  segment  shown  in  figure  8d  (the 
trees). Points in a 10 pixel wide margin along the
boundary of I(1) were ignored since the processing
did not take into account occlusion/disocclusion
effects  along  the  image  boundaries. 

Fig 8a.  Depth Histogram (Z component)


Fig  8b.  Sign  Segments 

Fig 8c.  Pole Segments


Fig  8d.  Tree  Segments 


II. G.  Summary  and  Extensions 


This work demonstrates a simple and robust
procedure for determining the direction of
environmental motion and image displacements in
real-world image sequences produced by observer
translation. It is not dependent on an initial
matching process prior to the inference of camera
motion. Instead, features are extracted from an
initial image and their displacements are
determined concurrently with the inference of
direction of sensor motion. Thus complications in
matching that arise from an individual feature
being extracted in one image and not in the next
are reduced. The process is also relatively
insensitive to weak and false features. It has
been successfully applied to image sequences
produced by a car translating down a road, by a
camera attached to a robot manipulator in an
industrial environment, and to artificially
generated sequences. We now consider some
extensions.


II. G. 1.  Other Cases of Restricted Motion


The procedure developed in this paper should be
applicable to other cases of unknown but restricted
camera motions for which it is computationally
feasible to search directly through a subspace of
the camera motion parameters. Two particular cases
are pure sensor rotation and motion constrained to
a known plane.

With pure sensor rotation, the unknown camera
parameters are constrained to R1(t), R2(t), and
R3(t). In this case, the error measure from
section II. D. 3. would be defined with respect to
the direction of rotation sphere where each point
corresponds to an axis of rotation. For each
rotational axis, the extent of displacement for
image features is determined by different values of
R3(t). There is the additional constraint in the
rotational case that the displacements of all
features must correspond to the same value of
R3(t).

For motion constrained to a known plane, the
rotational axis is known to be perpendicular to
that plane and the translational axis is
constrained to lie in that plane. Thus, only R3(t)
and one translational parameter can vary and the
error measure can be computed with respect to these
two parameters. The global sampling in this case
amounts to evaluating a set of translational axes
for each of a set of potential rotations.


II.G.2.  Multiple  Independently  Moving  Objects 


The processing here has been limited to a camera
moving  relative  to  a  stationary  environment,  or  a 
stationary  camera  with  a  stationary  background  and 
a  single  moving  object.  A  useful  extension  would 
allow  for  several  independently  moving  objects  with 
different  directions  of  translation.  The  technique 
of  summation  of  errors  in  feature  matching  only 
allows  a  single  axis  of  translation  to  be 
determined  and  will  cause  the  analysis  of  the 
several  objects  in  independent  motion  to  be 
confounded. 

One  approach  is  to  segment  an  image  into  regions 
which  potentially  correspond  to  objects,  or  to 
arbitrarily  divide  the  image  into  regular 
overlapping  subimages  and  perform  the  translational 
analysis  for  each  region  or  subimage  independently 
[WIL80, NAG79]. Experiments have shown it is
possible  to  work  with  small  image  areas,  at  a  size 
comparable  to  extracted  regions  or  subimage  areas, 
and  still  determine  the  axis  of  translation  with  a 
reasonable  level  of  precision.  If  objects  with 
similar  translations  correspond  to  several 
different regions or image subareas, then similar
translational  axes  will  be  determined  for  these 
regions  or  subimages.  If  objects  with  different 
translations  correspond  to  the  same  regions  or 
subimages  then  there  will  be  poor,  indistinct  error 
values  for  the  error  function.  For  this  second 
case,  it  is  necessary  to  resegment  and  redetermine 
a  translational  axis. 
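
The regular subdivision option can be sketched as follows. The window size, the stride, and the function names are hypothetical; the translational analysis itself would then be run on the features of each window independently, which is not shown.

```python
def overlapping_subimages(width, height, size, stride):
    """Square windows (left, top, right, bottom) covering the image.

    Windows have side `size` and are stepped by `stride`; choosing
    stride < size makes neighboring windows overlap.
    """
    windows = []
    for top in range(0, max(height - size, 0) + 1, stride):
        for left in range(0, max(width - size, 0) + 1, stride):
            windows.append((left, top, left + size, top + size))
    return windows


def features_in_window(features, window):
    """Features given as (col, row) that fall inside one window."""
    left, top, right, bottom = window
    return [(c, r) for c, r in features
            if left <= c < right and top <= r < bottom]
```

Windows whose error function turns out poor and indistinct would be candidates for the resegmentation step described above.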


II. G. 3.  Stabilized  Retina 


Translational processing is sufficient for
vision-based  navigation  in  a  stationary  environment 
if  the  orientation  of  the  optic  sensor  can  be  fixed 
relative  to  the  environment  over  time.  In  this 
case,  sensor  motion  amounts  to  a  sequence  of 
translations  in  possibly  different  directions  over 
time. 


A  difficulty  with  such  a  stabilized  retina  is  that 
much  of  the  environment  would  not  be  observable. 
This  can  be  corrected  by  using  a  set  of  such 
stabilized  retinas  arranged  to  yield  a  complete 
view  of  space.  There  would  then  be  no  need  to 
rotate  the  sensor  to  view  a  particular 
environmental  point.  A  possible  arrangement  of 
retinal  surfaces  is  a  cubical  one.  One  of  the 
retinal  planes  will  always  contain  an  FOE  and 
another  will  always  contain  an  FOC  (unless  the 
direction  of  motion  is  right  on  an  edge  of  the  cube 
and  the  focal  length  has  not  been  properly 
adjusted).  There  will  also  be  several  independent 
estimates of the direction of translation which can
be integrated.
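
For the cubical arrangement, the faces holding the FOE and FOC can be found directly from the direction of translation. The sketch assumes six retinal planes with optical axes along the positive and negative x, y, and z directions, each covering a 90 degree field of view; the face labels and the function name are illustrative.

```python
def cube_faces(axis):
    """Return (foe_face, foc_face) labels such as '+z' and '-z'.

    The FOE falls on the face whose axis has the largest absolute
    component of the direction of translation; the FOC falls on the
    opposite face.  Ties (motion along an edge or corner of the cube)
    are resolved in favor of the first axis in x, y, z order.
    """
    names = "xyz"
    k = max(range(3), key=lambda i: abs(axis[i]))
    sign = "+" if axis[k] >= 0 else "-"
    opposite = "-" if sign == "+" else "+"
    return sign + names[k], opposite + names[k]
```

For the translational axis (-.83738, -.42043, .34933) reported earlier, the FOE would fall on the -x face and the FOC on the +x face under these assumptions.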

III.  THE  LOCAL  TRANSLATIONAL  DECOMPOSITION 


We now extend the translational case to less
restricted forms of sensor motion by applying the
procedure for determining the direction of
translational motion to small, overlapping areas
across an image surface over a sequence of images.
The motivation is to approximate general motions as
consisting locally of environmental translations
and to interpret local image motion as resulting
from environmental translations. The feasibility
of  this  is  based  upon  experiments  showing  that  the 
direction  of  translation  can  be  extracted  with 
reasonable  precision  using  small  image  areas 
containing  a  few  features.  The  resulting 
description  of  motion  is  an  approximation  to  the 
Environmental Direction of Motion Field (EDMF)
(section  I. A.)  which  associates  with  a  set  of  image 
points  (or  small  image  areas)  the  direction  of 
motion  of  the  corresponding  environmental  point  (or 
small  environmental  surface  area).  As  a  low  level 
representation  of  environmental  motion,  this 
considerably  simplifies  the  recovery  of  the  sensor 
motion  parameters. 

This  section  is  divided  into  three  parts.  In 
III. A.,  the  properties  of  the  EDMF  for  different 
sensor  motions  are  summarized.  The  cases 
considered  are  pure  rotational  motion;  motion 
constrained  to  an  unknown  plane;  and  arbitrary 
motion.  This  analysis  shows  how  to  recover  the 
axis  of  rotation  from  the  EDMF  for  these  cases. 

Techniques  for  computing  the  EDMF  from  image 
sequences  are  presented  in  section  III.B.  There 
are  two  cases  considered:  1)  sequences  for  which 
image  displacement  vectors  have  not  been 
determined;  and  2)  sequences  for  which  image 
displacement  vectors  have  been  determined.  In  the 
first  case,  computing  the  EDMF  also  determines 
image  displacements. 

Section  III.C.  demonstrates  the  use  of  the  local 
translational  decomposition  for  processing  image 
sequences  that  are  produced  by  sensor  motion 
constrained  to  an  unknown  plane  in  highly  textured 
environments.  There  are  indications  that  this 
processing  is  quite  robust.  We  also  note  the 
effect  of  coupling  the  EDMF  and  environmental 
rigidity  constraints  for  the  recovery  of  relative 
depth. 


III. A.  EDMF  Properties  for  Different  Types  of 
Camera Motion


Before discussing the computation and use of the
EDMF,  it  is  necessary  to  describe  some  of  its  basic 
properties  for  different  classes  of  motion.  To 
describe  these  properties,  it  is  useful  to  map  the 
EDMF  vectors  onto  the  direction  of  translation 
sphere.  In  section  II,  the  direction  of 
translation  sphere  was  used  as  the  domain  for  the 
error  measure.  Here  it  is  used  in  a  manner  similar 
to  a  histogram.  Each  EDMF  vector  votes  for  a 
particular  point  on  the  direction  of  translation 
sphere.  Processing  then  involves  finding  certain 
patterns  in  the  distribution  of  the  EDMF  vectors. 


III. A.  1.  Pure  Translational  Motion  of  the  Camera 


As  discussed  above,  for  translational  motion  the 
image  displacement  paths  are  straight  lines 
intersecting  at  a  point.  The  environmental 
displacement  paths  are  straight,  parallel  lines. 
All  the  vectors  in  the  EDMF  are  identical  and  map 
onto  a  single  point  on  the  Direction  of  Translation 
Sphere  corresponding  to  the  translational  axis. 


III. A. 2. Pure Rotational Motion of the Camera


For  pure  rotational  motion  of  the  camera,  the  image 
displacement  paths  are  conic  sections  determined  by 
the intersection of the image plane with the nested
family  of  cones  aligned  with  the  axis  of  rotation 
based  at  the  origin  of  the  camera  coordinate 
system.  The  environmental  displacement  paths  are 
circles  about  the  axis  of  rotation  and  are 
contained in planes perpendicular to it. The EDMF
vectors will lie upon a great circle contained in a
plane  perpendicular  to  the  axis  of  rotation  when 
mapped  onto  the  direction  of  translation  sphere. 

III. A. 3.  Motion  Constrained  to  an  Unknown  Plane 


For  this  case,  the  environmental  displacement  paths 
are  circles  in  planes  perpendicular  to  the  axis  of 
rotation,  but  the  axis  does  not  necessarily  contain 
the  origin  of  the  coordinate  system  (see  the 
discussion  of  kinematics  in  chapter  1  of  [WHI44]). 
As  for  the  rotational  case,  the  EDMF  vectors  will 
lie  on  a  great  circle  in  a  plane  perpendicular  to 
the  axis  of  rotation  when  mapped  onto  the  Direction 
of  Translation  Sphere. 
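
The great circle property stated in III. A. 2. and III. A. 3. follows from a
short calculation. The notation below (rotation matrix R, unit axis a, a
point c on the axis, environmental point X) is introduced here for
illustration and does not appear in the original text. The sketch assumes a
finite rotation about a fixed axis; c = 0 gives pure rotation about the
camera origin.

```latex
% Finite rotation R about the unit axis a passing through the point c:
X' = c + R\,(X - c), \qquad R\,a = a .
% R is orthogonal, so R a = a implies a^{T} R = a^{T}, and therefore
a^{T}(X' - X) = a^{T}(R - I)(X - c) = (a^{T} - a^{T})(X - c) = 0 .
% Every normalized displacement lies on the great circle perpendicular to a:
\frac{X' - X}{\lVert X' - X \rVert} \in \{\, u : \lVert u \rVert = 1,\ a^{T}u = 0 \,\}
```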

III. A. 4.  Arbitrary  Motion 


For  arbitrary  motion,  the  image  displacement  paths 
cannot  be  easily  described.  But  the  environmental 
displacement  paths  are  helices  about  an  axis  which 
does  not  necessarily  contain  the  origin  (since  a 
screw  displacement  is  the  most  general  form  of  a 
rigid body motion [COX61, WHIT44]).


The  set  of  normalized  tangent  vectors  to  a  helix, 
when  based  at  a  common  origin,  will  generate  a 
cone,  called  the  tangent  cone.  The  orientation  of 
this  cone  specifies  the  axis  of  rotation.  The  set 
of  tangent  cones  determined  by  a  rigid  body  motion 
for  all  points  in  space  will  all  have  the  same 
orientation.  Note  that  the  difference  vectors 
between  any  vectors  of  a  tangent  cone  will  lie  in  a 
plane  perpendicular  to  the  axis  of  rotation. 
Because  of  this,  the  EDMF  produced  during  arbitrary 
motion  has  a  particularly  nice  property  if  the 
rigid  body  motion  is  constant  over  two  or  more 
intervals. For such motion there will be
successive  environmental  direction  of  motion 
vectors  associated  with  each  image  point  and  the 
difference  vectors  between  these  successive  EDMF 
vectors  will  lie  in  the  same  plane,  perpendicular 
to  the  axis  of  rotation,  for  all  image  points. 
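
The tangent cone argument can be written out as follows. The
parameterization (unit axis a, orthonormal vectors e_1 and e_2 perpendicular
to a, radius rho, angular rate omega, axial advance h per unit time, and a
point c on the axis) is an assumption made for this sketch and is not
notation from the original text.

```latex
% Helix traced by an environmental point under a constant screw motion:
x(t) = c + \rho\,(\cos\omega t\; e_1 + \sin\omega t\; e_2) + h\,t\,a
\dot{x}(t) = \rho\,\omega\,(-\sin\omega t\; e_1 + \cos\omega t\; e_2) + h\,a
\lVert \dot{x}(t) \rVert = \sqrt{\rho^{2}\omega^{2} + h^{2}}
% Unit tangent and its constant component along the axis (the tangent cone):
\tau(t) = \dot{x}(t) / \lVert \dot{x}(t) \rVert , \qquad
a^{T}\tau(t) = \frac{h}{\sqrt{\rho^{2}\omega^{2} + h^{2}}}
% The difference of any two tangents of the same helix has no axial part:
a^{T}\bigl(\tau(t_1) - \tau(t_2)\bigr) = 0
```

The opening angle of the cone depends on rho, so it differs from point to
point, but the plane containing the difference vectors is the same for all
points.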


III. B. Computing the EDMF


III.B. 1. From an Unprocessed Image Sequence


The  translational  processing  procedure  described  in 
section  II  yields  a  set  of  image  displacements 
consistent  with  a  determined  translational  axis. 
Applying  this  procedure  to  a  small  area  of  an  image 
containing  extracted  features  finds  a  set  of  image 
displacements  consistent  with  Interpreting  the 
local  image  motion  as  if  it  were  produced  by  a 
translation  of  the  corresponding  part  of  the 
environment.  Note  that  where  the  translational 
approximation  is  poor  (for  example,  image  areas 
near  the  intersection  of  the  axis  of  rotation  and 
the  image  plane)  there  will  be  a  large  value  of  the 
error  measure  describing  the  validity  of  the 
translational  axis.  Thus,  the  error  measure  can 
serve  to  validate  the  approximation.  It  is  also 
necessary to incorporate information concerning the
number  and  distribution  of  the  feature  points  in 
the  local  image  areas  for  this  evaluation.  For 
example,  if  there  is  only  one  feature  in  the  area 
or  the  features  are  bunched  together,  the 
translational approximation will be poor.
Processing is not applied to local areas which do
not satisfy these requirements.

Figure  9a  is  a  138x128  pixel  image  of  some  grass 
texture  with  seven  bits  of  intensity.  Figure  9b 
was  derived  from  figure  9a  by  applying  a  simulated 
rotation  of  0.1  radians  about  the  Y  axis  of  the 
camera  coordinate  system  (the  focal  length  was  set 
to  one).  Features  were  selected  from  the  image  in 
figure  9a  by  first  determining  image  points  where 
the  contrast  was  greater  than  20  intensity  levels, 
and then finding local maxima in the
distinctiveness  values  (section  II. C.)  associated 
with  the  5x5  pixel  square  features  centered  at 
those  points.  The  resulting  feature  positions  are 
shown  in  figure  10. 

Using the translational processing procedure, the
direction  of  translation  was  determined  for  11x11 
pixel  neighborhoods  centered  at  each  feature  in 
figure  10.  The  image  displacement  associated  with 
a  feature  was  that  determined  by  the  best 
translational  approximation  for  the  feature's 
neighborhood.  The  resulting  image  displacement 
field  is  shown  in  figure  11. 


Fig  9a.  Grass  Texture  Image  1 


Fig 9b. Grass Texture Image 2


Fig  10.  Extracted  Feature  Positions 


Fig  11.  Image  Displacements 
III.B.2.  From  a  Computed  Displacement  Field 


An  error  measure  can  be  developed  to  evaluate 
translational  axes  for  image  sequences  for  which 
image  displacements  have  been  determined.  The 
error, with respect to an image displacement field,
can  be  calculated  for  a  translational  axis  by 
summing  the  angles  between  the  image  displacement 
vectors  and  the  image  displacement  paths  from  the 
FOE  or  FOC  determined  by  the  translational  axis 
(figure  12).  Similarly,  the  sum  of  one  minus  the 
cosine  of  each  angle  could  be  used.  To  compute  the 
EDMF, the translational axis is determined by applying
this  error  measure  to  local  areas  of  the  computed 
displacement  field. 
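
A minimal sketch of this error measure is given below, assuming image
coordinates with the focal length set to one and a candidate FOE (or FOC)
given directly as an image point. The function name and its parameters are
hypothetical and chosen for illustration; they are not taken from the
original implementation.

```python
import math

def translational_error(foe, points, displacements,
                        contracting=False, use_cosine=False):
    """Sum, over all features, of the angle between each image displacement
    vector and the radial displacement path through that feature.

    foe           -- (x, y) image position of the candidate FOE or FOC
    points        -- list of (x, y) feature positions
    displacements -- list of (dx, dy) image displacement vectors
    contracting   -- True if the point is an FOC (paths point towards it)
    use_cosine    -- sum (1 - cos(angle)) instead of the angle itself
    """
    sign = -1.0 if contracting else 1.0
    total = 0.0
    for (px, py), (dx, dy) in zip(points, displacements):
        # Direction of the displacement path at this feature.
        rx, ry = sign * (px - foe[0]), sign * (py - foe[1])
        r_len = math.hypot(rx, ry)
        d_len = math.hypot(dx, dy)
        if r_len == 0.0 or d_len == 0.0:
            continue  # direction undefined, contributes nothing
        cos_a = (rx * dx + ry * dy) / (r_len * d_len)
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding
        total += (1.0 - cos_a) if use_cosine else math.acos(cos_a)
    return total
```

Evaluating this function over candidate axes for the displacement vectors
in a local window, and keeping the axis with the smallest value, gives one
EDMF vector for that window.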


It  may  be  possible  to  determine  the  EDMF  from 
sparse  image  displacement  fields  by  filling  the 
image  displacement  field  to  an  adequate  density  by 
a  smoothing  or  averaging  procedure  which  treats  the 
sparse  determined  image  displacements  as  boundary 
conditions  and  then  locally  applying  the 
translational  processing  procedure  using  the 
adapted  error  measure. 


Fig 12. Error Measure


III. C. Processing the Computed EDMFs


III.C.1. Processing Arbitrary Planar Motion


For arbitrary planar motion, all the environmental
displacements  are  constrained  to  lie  in  planes 
perpendicular  to  the  axis  of  rotation.  In  this 
case  four  of  the  five  recoverable  camera  parameters 
are  unconstrained:  the  axis  and  extent  of  rotation 
are  arbitrary,  and  the  translational  axis  is 
constrained  to  be  perpendicular  to  the  rotational 
axis. When the ideal EDMF vectors are mapped onto
the Direction of Translation Sphere, they will lie
on  a  great  circle  in  a  plane  perpendicular  to  the 
axis  of  rotation.  Thus,  processing  consists  of 
determining  the  EDMF  and  finding  the  best  planar 
fit of the EDMF vectors which also contains the
origin. This may be done using any of a number of
plane fitting routines. In the experiments here,
the eigenvector fit procedure described in [DUD73]
pp.  332-335  is  used,  having  been  adapted  for 
planes  containing  the  origin. 
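
One way such a fit could be sketched is shown below. It is not the [DUD73]
routine itself: it assumes the usual least-squares formulation in which the
plane normal is the eigenvector of the (uncentered) scatter matrix with the
smallest eigenvalue, and it finds that eigenvector by power iteration on a
shifted matrix so that only the standard library is needed. The function
name is hypothetical.

```python
import math

def fit_plane_through_origin(vectors, iterations=200):
    """Return a unit normal of the plane through the origin that best fits
    the given 3-D vectors in the least-squares sense."""
    # Scatter matrix S = sum of v v^T (no mean subtraction, because the
    # plane is required to contain the origin).
    s = [[0.0] * 3 for _ in range(3)]
    for v in vectors:
        for i in range(3):
            for j in range(3):
                s[i][j] += v[i] * v[j]
    trace = s[0][0] + s[1][1] + s[2][2]
    # M = trace*I - S has the same eigenvectors as S; its largest
    # eigenvalue belongs to the smallest eigenvalue of S.
    m = [[(trace if i == j else 0.0) - s[i][j] for j in range(3)]
         for i in range(3)]

    def mat_vec(a, x):
        return [sum(a[i][k] * x[k] for k in range(3)) for i in range(3)]

    best, best_q = None, -1.0
    # Try each coordinate axis as a start so that an unlucky start vector
    # orthogonal to the answer cannot spoil the result.
    for start in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        x = start
        for _ in range(iterations):
            y = mat_vec(m, x)
            norm = math.sqrt(sum(c * c for c in y))
            if norm == 0.0:
                break
            x = [c / norm for c in y]
        q = sum(a * b for a, b in zip(x, mat_vec(m, x)))  # Rayleigh quotient
        if q > best_q:
            best, best_q = x, q
    return best
```

The returned normal is defined only up to sign, and it is the estimate of
the axis of rotation in the planar motion case.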

Note  that  if  the  motion  occurs  over  several 
successive  instants  and  remains  constrained  to  the 
same  plane,  then  the  vectors  in  the  successive 
EDMFs  are  also  constrained  to  lie  in  the  plane 
parallel  to  it  and  containing  the  origin.  Thus 
more and more values for the fit can be collected
over  time,  thereby  increasing  the  accuracy  of  the 
processing. 

For  example,  using  the  EDMF  determined  for  the 
grass  texture  sequence  described  in  section 
III.B. 1.,  the  normal  to  the  best  plane  fit  was 
determined to be (.002518, .999893, -.0143709). This
is  off  by  .014592  radians  or  .836053  degrees  from 
the  correct  rotational  axis.  Figure  13  shows  a 
histogram of the computed EDMF vectors in a polar
coordinate system (for the unit vectors (X,Y,Z) on
the direction of translation sphere, Phi1 =
arctan(Y/X), Phi2 = arccos(Z)). The number of
vectors  at  a  particular  location  is  encoded  by 
darkness.  Note  the  orientation  in  a  plane 
perpendicular  to  the  Y  axis. 
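
The voting scheme behind such a histogram might be sketched as follows. The
bin counts and the function name are assumptions made for illustration, and
atan2 is used in place of arctan(Y/X) so that the full range of Phi1 is
covered without dividing by zero.

```python
import math

def edmf_histogram(vectors, bins1=36, bins2=18):
    """Accumulate direction vectors into a (Phi2, Phi1) histogram.

    Rows index Phi2 = arccos(Z) over [0, pi]; columns index
    Phi1 = atan2(Y, X) over [-pi, pi].
    """
    hist = [[0] * bins1 for _ in range(bins2)]
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue  # no direction to vote for
        phi1 = math.atan2(y, x)
        phi2 = math.acos(max(-1.0, min(1.0, z / norm)))
        # Map each angle to a bin, clamping the upper edge into the last bin.
        col = min(int((phi1 + math.pi) / (2.0 * math.pi) * bins1), bins1 - 1)
        row = min(int(phi2 / math.pi * bins2), bins2 - 1)
        hist[row][col] += 1
    return hist
```

Bins of equal angular size do not cover equal areas of the sphere, so
counts near the poles (Phi2 near 0 or pi) are concentrated relative to those
near the equator.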


Fig  13.  EDMF  Histogram 


Figure  14  shows  a  32x32  image  displacement  field 
produced  using  a  spherical  distribution  of 
environmental points about the Z-axis (the observer
is  looking  into  the  interior  of  a  sphere)  with 
noise  modulation  added  to  the  depth  values  of  the 
points.  A  rotation  of  0.1  radians  occurs  about  an 
axis  whose  orientation  is  parallel  to  (.577350, 
.577350,  .577350)  and  positioned  at  the  back  of  the 
sphere.  The  local  direction  of  translation  was 
determined  at  all  positions  across  the  displacement 
field  for  5x5  pixel  windows,  using  the  measure 
described in section III.B.2. Using all the
determined  EDMF  vectors  in  the  plane  fitting 
procedure,  the  normal  is  determined  to  be  (.647159, 
.543663,  .534429).  This  deviates  from  the  correct 
axis by .088631 radians or 5.078184 degrees.
Figure  15  shows  the  error  values  in  the 
translational  fit  proportional  to  image  darkness. 
Note  that  the  greatest  errors  occur  where  the  image 
displacement  vectors  have  a  rotational  character. 
By  restricting  the  plane  fit  to  EDMF  vectors  which 
have  low  associated  error  values  for  the 
translational  approximation,  the  determination  of 
the axis of rotation is improved. By using EDMF
vectors  for  which  the  determined  error  measure  is 
less  than  90  degrees  over  the  5x5  pixel  areas,  the 
normal  is  determined  to  be  (.579462,  .583347, 
.569148).  This  deviates  by  .010380  radians  or 
.594798  degrees  from  the  correct  rotational  axis. 
Thus,  the  high  error  measure  values  have  been  used 
to  remove  the  rotational-like  displacements  in  the 
center  of  the  image. 
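
The angular deviations quoted in this section can be rechecked from the
printed normals. The sketch below assumes only the vectors given in the
text; the helper name is hypothetical. It uses atan2 on the cross and dot
products, which stays accurate for the small angles involved.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3-D vectors (need not be unit length)."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    return math.atan2(cross_norm, dot)

if __name__ == "__main__":
    y_axis = (0.0, 1.0, 0.0)
    diagonal = (0.577350, 0.577350, 0.577350)
    cases = [
        ("grass texture", (0.002518, 0.999893, -0.0143709), y_axis),
        ("sphere, all vectors", (0.647159, 0.543663, 0.534429), diagonal),
        ("sphere, low error only", (0.579462, 0.583347, 0.569148), diagonal),
    ]
    for name, normal, axis in cases:
        rad = angle_between(normal, axis)
        print(name, rad, math.degrees(rad))
```

Because the normals are printed to six or seven digits, the recomputed
angles agree with the quoted ones only to within a few units in the sixth
decimal place.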

Once  the  axis  of  rotation  has  been  determined, 
processing  has  been  reduced  to  the  case  of  known 
planar  motion.  This  could  be  solved  directly  via 
the  suggested  adaptation  of  the  translational 
technique  to  known  planar  motion  (section  II. 0. 1.). 
Alternatively, the inference techniques of Prazdny
[PRA81] and Nagel [NAG81a, NAG81b] could be applied
to  the  image  displacement  field  determined 
concurrently  with  the  EDMF.  In  these  techniques  a 
composite  image  displacement  field  (one  produced  by 
combined  camera  rotation  and  translation)  is 
decomposed  into  its  translational  and  rotational 
components  by  searching  through  the 
three-dimensional  space  of  rotational  parameters  to 
find  a  rotational  displacement  field  which,  when 
subtracted  from  the  composite  field,  yields  a 
translational  displacement  field.  By  having 
determined  the  axis  of  rotation  via  analysis  of  the 
EDMF,  this  search  has  been  reduced  to  a  single 
bounded  dimension  corresponding  to  the  extent  of 
rotation. 

For the case of planar motion, the FOE or FOC is
further  constrained  to  lie  along  a  line  in  the 
image  plane  determined  by  the  intersection  of  the 
image  plane  and  the  plane  perpendicular  to  the  axis 
of  rotation  and  containing  the  focal  point. 
Because  of  this,  the  decomposition  procedure  is 
simplified.  When  the  correct  rotational  field  is 
subtracted  from  the  composite  field,  the  resulting 
field should have an FOE or FOC along the line.
Thus,  it  is  only  necessary  to  evaluate  the 
distribution  of  the  intersections  of  the  image 
displacement  vectors  resulting  from  subtraction  of 
a  hypothesized  rotational  displacement  field  with 
this  line. 


Fig  14.  Image  Displacement  Field 


Fig  15.  Translational  Approximation  Error 

Note that by mapping the EDMF onto the direction of
translation sphere, the local differential
properties of the EDMF are not being utilized. We
suspect that the extent of rotation can be
recovered, or at least strongly constrained, by
analyzing the local changes in the orientation of
the EDMF vectors either spatially (over a small
area of an image) or temporally (over successive
inter-image intervals). If this is so, processing
could  be  directly  reduced  to  the  purely 
translational  case  by  removal  of  the  determined 
rotational  component. 

Let  us  consider  the  case  where  the  parameters  of 
motion  remain  constant  over  successive  intervals. 
Here the angle between the successive EDMF vectors
associated  with  an  image  point  will  be  equal  to  the 
angle  of  rotation.  This  angle  will  be  the  same  for 
all  points  in  the  image  sequence  and  suggests  a 
potentially  robust  technique  for  determining  the 
extent  of  rotation  by  finding  the  mean  angle 
between successive EDMF vectors.
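
A small sketch of this estimate follows. It assumes the two fields are
given as lists of 3-D direction vectors paired by image point, and it tests
the idea on synthetic directions lying in the plane perpendicular to the
axis, as in the planar motion case of this section. All function names are
hypothetical.

```python
import math

def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def angle_between(u, v):
    """Angle in radians between two 3-D vectors."""
    c = _cross(u, v)
    return math.atan2(math.sqrt(_dot(c, c)), _dot(u, v))

def rotate_about_axis(v, axis, theta):
    """Rotate v by theta radians about axis (Rodrigues' rotation formula)."""
    n = math.sqrt(_dot(axis, axis))
    k = [c / n for c in axis]
    kxv = _cross(k, v)
    kdv = _dot(k, v)
    ct, st = math.cos(theta), math.sin(theta)
    return [v[i] * ct + kxv[i] * st + k[i] * kdv * (1.0 - ct)
            for i in range(3)]

def mean_successive_angle(first, second):
    """Mean angle between corresponding vectors of two successive fields."""
    angles = [angle_between(u, v) for u, v in zip(first, second)]
    return sum(angles) / len(angles)

if __name__ == "__main__":
    axis = (0.0, 1.0, 0.0)
    first = [(math.cos(a), 0.0, math.sin(a)) for a in (0.0, 1.0, 2.5)]
    second = [rotate_about_axis(v, axis, 0.1) for v in first]
    print(mean_successive_angle(first, second))
```

Averaging over many image points is what would make the estimate tolerant
of noise in individual EDMF vectors; a median could be substituted if
outliers are expected.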


III.C.2. Coupling the Approximated EDMF and
Rigidity Constraints


There  are  several  formulations  for  the  recovery  of 
environmental  depth  and  camera  motion  parameters 
based upon environmental rigidity [LAW80, NAG81,
HER80, ROA80, ULL79, WEB81]. Solving these
constraints  is  simplified  when  information  about 
the  direction  of  environmental  motion  is 
incorporated  into  them.  In  particular,  the  number 
of  points  in  successive  frames  that  is  necessary  to 
infer  their  relative  depth  is  reduced  from  five  to 
two. 

For two points in successive images there are four
unknowns to be recovered corresponding to the
depths of the two points at instants t and t+1.
One  of  the  depths  at  time  t  can  be  set  arbitrarily 
since  only  relative  depth  can  be  recovered.  Both 
depths  of  the  points  at  time  t+1  can  be  determined 
from  their  image  displacement  vectors,  their  depths 
at time t, and their corresponding EDMF vectors.
(To see this (figure 16), note that given 1)
successive rays of projection P1 and P2; 2) a depth
D for the corresponding environmental point along
P1; and 3) the direction of environmental motion for
the point along P1, the depth of the environmental
point along P2 can be determined by selecting the
point on P2 that is closest to the (dotted) line
determined by the environmental point along P1 and
its direction of motion.) Thus, the depth of one
of  the  points  can  be  set  arbitrarily  and  the  other 
depth  determined  based  on  satisfaction  of  the 
rigidity  constraint  over  successive  instants. 
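
The geometric construction in the parenthetical note above can be sketched
as a closest-point computation between two lines. The sketch assumes a
focal length of one, so that an image point (x, y) defines the ray
(x, y, 1) through the origin, and it treats depth as the Z coordinate along
that ray. The function name and argument layout are hypothetical.

```python
def depth_after_motion(p1, depth1, motion_dir, p2):
    """Depth along the ray through image point p2 of the point on that ray
    closest to the line through the environmental point (at depth1 along
    the ray through p1) in the direction motion_dir.

    Returns None when the ray and the motion line are parallel.
    """
    r1 = (p1[0], p1[1], 1.0)
    r2 = (p2[0], p2[1], 1.0)
    start = tuple(depth1 * c for c in r1)  # environmental point at time t

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Minimize |t*r2 - (start + s*motion_dir)|^2 over t and s.
    a = dot(r2, r2)
    b = dot(r2, motion_dir)
    c = dot(motion_dir, motion_dir)
    d = dot(r2, start)
    e = dot(motion_dir, start)
    det = a * c - b * b
    if abs(det) < 1e-12:
        return None
    return (d * c - b * e) / det
```

For example, a point at (1, 0, 5) that moves along the direction
(-1, 0, 1) to (-1, 0, 7) projects to image points (0.2, 0) and (-1/7, 0),
and the function recovers the new depth of 7 from the first depth of 5.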

Each  point  can  thus  assign  relative  depths  to  all 
other  image  points.  This  suggests  a  consistency 
computation  wherein  agreement  between  the  relative 
depth maps determined by each point is used to
find  a  globally  consistent  depth  map. 


Fig  16.  Depth  Inference 


Discussion


This work shows that if the EDMF can be reliably
computed, it is a very useful low level
representation for rigid body motion analysis.
There are strong indications that this is possible
for densely textured image sequences and that
camera  motion  parameters  can  be  recovered  for  cases 
of  motion  of  complexity  corresponding  to  motion 
constrained  to  an  unknown  plane. 

Techniques  for  processing  arbitrary  motion  have 
been  suggested  in  section  III. A. 4.  (finding  the 
best  planar  fit  to  the  difference  vectors  of 
successive  EDMF  vectors)  and  in  section  III.C.2 
(solving  rigidity  constraints  coupled  with 
information in the EDMF). The primary question
concerns the robustness of processing when using a
noisy, approximated EDMF in the arbitrary case.